WebSockets
Let's learn about the WebSocket protocol and discuss its pros and cons.
Motivation#
Most web APIs use HTTP as their underlying protocol to transfer data, and HTTP is often considered one of the best options for executing batch tasks asynchronously. But when it comes to two-way, real-time communication such as chat, live streaming, and gaming, HTTP falls short because it is a request-response protocol in which the server usually closes the connection after sending the response. The table below describes some HTTP-based techniques for approximating bidirectional communication, along with their limitations:
Technique | Description | Limitation |
Short polling | Requests frequently for updates from the server after fixed short intervals. The server responds whether it has an update or not. | Sends too many unnecessary requests for updates |
Long polling | The client requests updates and the server holds the request open (subject to some constraints), responding only when it has an update | Because HTTP follows a request-response model, the client needs multiple concurrent connections to send data and receive updates, which wastes resources |
HTTP streaming | The server streams bytes of data continuously to the client over a single connection that is kept open | Communication is half-duplex, so the client cannot send data while the server is streaming over that connection |
Note: The table above focuses on HTTP/1.1 because the other versions were not introduced when WebSocket was first developed.
We conclude from the above discussion that we need a different approach to achieve two-way data transfer between the server and client, one in which the server does not have to wait for a client request before sending data. We need an approach that has low latency and avoids repeated TCP handshakes by keeping the connection open.
What is a WebSocket?#
WebSocket was introduced in 2011 to enable full-duplex, asynchronous communication over a single TCP connection and to use resources efficiently. An HTTP connection restricts TCP to one-sided communication, in which the client always initiates the exchange because of the request-response model. In other words, the client first sends a request and then the server responds to it, which is half-duplex communication. In contrast, WebSocket takes full advantage of the TCP connection, allowing clients and servers to send or receive data on demand.
WebSocket leverages the underlying TCP channel and its full-duplex nature: the client and the server can send and receive data simultaneously. WebSocket is a stateful protocol that is generally faster than HTTP for frequent message exchange because it's lightweight and does not carry the overhead of large headers with each message.
Note: The WebSocket protocol is specified in IETF's RFC 6455, whereas the browser-side WebSocket API is documented separately (originally by the W3C).
A WebSocket client first establishes an HTTP connection and then upgrades it to the WebSocket protocol. After the upgrade, all transmission happens directly over the same TCP connection. WebSocket URLs begin with ws:// for non-TLS connections and wss:// for TLS-based connections.
The illustration above depicts the conversion from HTTP to a WebSocket connection. We can see that the transmission layer connection (TCP) is the same while the application layer protocol is updated from HTTP to WebSocket.
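As a small illustration of the two URL schemes, the sketch below maps a WebSocket URL to the host, port, and TLS setting a client would connect with. The function name `describe_ws_url` is hypothetical, and the default ports (80 for ws://, 443 for wss://) are the ones mentioned later in this lesson.

```python
from urllib.parse import urlsplit

# Default ports: ws:// reuses the HTTP port, wss:// reuses the HTTPS port.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def describe_ws_url(url: str) -> dict:
    """Hypothetical helper: extract connection parameters from a WebSocket URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a WebSocket URL: {url!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
        "tls": parts.scheme == "wss",   # wss:// means the TCP connection is wrapped in TLS
        "path": parts.path or "/",
    }
```

For example, `describe_ws_url("wss://example.com/chat")` reports port 443 with TLS enabled, while a URL such as `ws://localhost:8080` keeps its explicit port.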
How does it work?#
A WebSocket connection starts with an HTTP connection established through a three-way TCP handshake. Afterward, an HTTP GET request is initiated to switch the protocol to WebSocket. The connection upgrade request can be accepted or rejected by the HTTP server. If the server is compatible with the WebSocket protocol and the upgrade request is valid, the connection is upgraded to a WebSocket connection. The response contains the status code 101 (Switching Protocols) and the value for the field, Sec-WebSocket-Accept.
HTTP Upgrade headers#
The headers of the initial switching protocol request are shown below:
The Upgrade request
GET / HTTP/1.1
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Key: eB9AWsQe8+SDcwWRjpGSow==
Sec-WebSocket-Version: 13
The Upgrade response
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: zglqWZt8l79gwpiFBgKihrobE8I=
For simplicity, we have removed some fields from the headers above.
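To make the accept-or-reject decision concrete, here is a minimal sketch of how a server might validate an upgrade request like the one shown. The function name `check_upgrade_request` and the choice of returning a bare status code are assumptions for illustration; a real server would also check fields omitted here, such as Host and Origin. Returning 426 for an unsupported version follows the suggestion in RFC 6455.

```python
import base64
import binascii


def check_upgrade_request(method: str, headers: dict) -> int:
    """Hypothetical check: return the HTTP status a server would answer with."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value.strip() for name, value in headers.items()}
    if method != "GET":
        return 400
    if h.get("upgrade", "").lower() != "websocket":
        return 400
    # Connection may list several tokens, e.g. "keep-alive, Upgrade".
    tokens = [t.strip().lower() for t in h.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        return 400
    if h.get("sec-websocket-version") != "13":
        return 426  # Upgrade Required: the server only speaks version 13
    try:
        key = base64.b64decode(h.get("sec-websocket-key", ""), validate=True)
    except (binascii.Error, ValueError):
        return 400
    # The key must decode to exactly 16 bytes.
    return 101 if len(key) == 16 else 400
```

Passing the request headers shown in this section yields 101, whereas a request without a valid key or with a different version is rejected.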
The upgrade request and response contain the following notable fields:
Status code 101 shows that the protocol was successfully upgraded and that the endpoints can now send WebSocket frames.
The Sec-WebSocket-Key is a base64-encoded, randomly generated 16-byte value. It lets the server demonstrate that it understood the request as a WebSocket upgrade, rather than as an ordinary or malformed HTTP request. The value is only encoded, not encrypted.
The server generates the Sec-WebSocket-Accept field by appending a fixed Globally Unique Identifier (GUID) value, 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, to the client-provided Sec-WebSocket-Key, computing the SHA-1 hash of the result, and base64-encoding that hash. The value is sent back to the client, which recomputes and compares it to confirm that the server has accepted the connection upgrade.
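The key and accept computation can be sketched in a few lines. This is a minimal illustration, assuming the procedure defined in RFC 6455 (append the fixed GUID to the key, hash with SHA-1, base64-encode); the function names `make_key` and `accept_for` are hypothetical.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_key() -> str:
    """Client side: a random 16-byte value, base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def accept_for(key: str) -> str:
    """Server side: SHA-1 of (key + GUID), base64-encoded."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample values taken from RFC 6455, section 1.3.
assert accept_for("dGhlIHNhbXBsZSBub25jZQ==") == "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

A client would call `accept_for` on the key it sent and compare the result with the Sec-WebSocket-Accept value in the response, treating a mismatch as a failed handshake.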
Quiz
How does the client know it requires a WebSocket connection in a particular scenario?
The developers of the client-side application are responsible for initializing the upgrade requests. This is a design decision that has to be made before developing the frontend of the application.
Once the connection is established and successfully upgraded to the WebSocket protocol, the endpoints can exchange frames. WebSocket has two types of frames: control frames and data frames. Each frame is identified by a 4-bit opcode. Control frames convey the status of the connection and can carry a maximum of 125 bytes of payload. They can also be interleaved with data frames, which carry the application data. A data frame is identified by an opcode whose most significant bit is zero, whereas a control frame's opcode has that bit set. A list of common frames and their brief descriptions is given below:
Frame | Type | Opcode | Description |
Text | Data | 0x1 | Indicates that the information carried by the frame is UTF-8 encoded text |
Binary | Data | 0x2 | Indicates that the information carried by the frame is in binary format |
Ping | Control | 0x9 | Sent by either endpoint (usually the server) to check whether the peer is alive |
Pong | Control | 0xA | Used to acknowledge a ping frame when the connection is live |
Close | Control | 0x8 | Both endpoints exchange close frames for terminating the connection |
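The frame layout behind these opcodes can be sketched as follows. This is a simplified encoder and decoder for single, unfragmented frames, assuming the wire format of RFC 6455 (opcode values 0x1, 0x2, 0x8 for close, 0x9 for ping, and 0xA for pong); the function names are hypothetical, and extensions and fragmentation are ignored.

```python
import os
import struct
from typing import Tuple

OPCODES = {"text": 0x1, "binary": 0x2, "close": 0x8, "ping": 0x9, "pong": 0xA}


def encode_frame(opcode: int, payload: bytes, mask: bool = False) -> bytes:
    """Build one unfragmented frame (FIN bit set)."""
    if opcode & 0x8 and len(payload) > 125:
        raise ValueError("control frames carry at most 125 bytes")
    first = 0x80 | opcode                 # FIN=1, then the 4-bit opcode
    mask_bit = 0x80 if mask else 0x00
    n = len(payload)
    if n <= 125:                          # length fits in 7 bits
        header = struct.pack("!BB", first, mask_bit | n)
    elif n <= 0xFFFF:                     # 126 marks a 16-bit extended length
        header = struct.pack("!BBH", first, mask_bit | 126, n)
    else:                                 # 127 marks a 64-bit extended length
        header = struct.pack("!BBQ", first, mask_bit | 127, n)
    if mask:                              # client-to-server frames are masked
        key = os.urandom(4)
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
        header += key
    return header + payload


def decode_frame(frame: bytes) -> Tuple[int, bytes]:
    """Return (opcode, unmasked payload) for one complete frame."""
    opcode = frame[0] & 0x0F
    masked = bool(frame[1] & 0x80)
    n = frame[1] & 0x7F
    offset = 2
    if n == 126:
        (n,) = struct.unpack_from("!H", frame, offset)
        offset += 2
    elif n == 127:
        (n,) = struct.unpack_from("!Q", frame, offset)
        offset += 8
    if masked:
        key = frame[offset:offset + 4]
        offset += 4
        data = frame[offset:offset + n]
        return opcode, bytes(b ^ key[i % 4] for i, b in enumerate(data))
    return opcode, frame[offset:offset + n]
```

An unmasked text frame carrying `b"hi"` is just four bytes: two header bytes followed by the payload. The header grows only when the payload exceeds 125 bytes or when masking is enabled.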
When closing a WebSocket connection, close frames are exchanged between endpoints. After an endpoint sends a close frame, it must not send more data, and an endpoint that receives a close frame replies with one of its own if it has not already sent one. Any metadata stored to maintain the TCP connection information must also be cleaned up by the endpoints during the connection teardown process. Finally, under normal circumstances, the server closes the TCP connection first (raising the FIN flag), and both endpoints close the TCP connection when the close sequence is complete.
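The close sequence can be modeled with a small state tracker. This is only a sketch: the `Endpoint` class and `close_payload` helper are hypothetical, no real sockets are involved, and the payload layout (a 2-byte status code followed by an optional UTF-8 reason, with 1000 meaning normal closure) is taken from RFC 6455 rather than from this lesson.

```python
import struct


def close_payload(code: int = 1000, reason: str = "") -> bytes:
    """Close frame body: 2-byte status code, then an optional UTF-8 reason."""
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("control frames carry at most 125 bytes")
    return body


class Endpoint:
    """Hypothetical model of one side of the closing handshake."""

    def __init__(self) -> None:
        self.sent_close = False
        self.received_close = False
        self.outbox: list = []            # (opcode, payload) pairs queued for sending

    def send_data(self, payload: bytes) -> None:
        if self.sent_close:
            raise RuntimeError("cannot send data after a close frame")
        self.outbox.append((0x2, payload))

    def send_close(self, code: int = 1000, reason: str = "") -> None:
        if not self.sent_close:           # a close frame is sent at most once
            self.outbox.append((0x8, close_payload(code, reason)))
            self.sent_close = True

    def on_close(self, payload: bytes) -> None:
        self.received_close = True
        code = struct.unpack("!H", payload[:2])[0] if len(payload) >= 2 else 1000
        self.send_close(code)             # reply with our own close if not sent yet

    @property
    def handshake_done(self) -> bool:
        # Only now should the underlying TCP connection be torn down.
        return self.sent_close and self.received_close
```

In this model, the side that initiates the close queues one close frame, the peer echoes the status code back, and both sides report `handshake_done` once they have sent and received a close frame.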
Advantages#
WebSockets perform well for real-time applications and provide the following benefits:
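Bidirectional communication channel: Both server and client can send and receive data on demand.
Higher frequency of data exchange: Data transmission is faster because the frame header is small, 2 to 10 bytes for unmasked frames and 4 bytes more for masked client-to-server frames.
Compatibility with existing infrastructure: WebSocket traffic passes through most firewalls because it uses the default HTTP and HTTPS ports, 80 and 443.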
Disadvantages#
WebSocket is a relatively new protocol and is not as mature as HTTP-style architectures. While it's great for specific scenarios, such as multiplayer gaming, live streaming, and video conferencing, it also has some limitations.
Horizontal scaling is complex, because we can’t load balance and reroute requests coming from a client once the connection is upgraded to WebSocket.
Connection failures have a large impact. Because the connection is stateful and the frame headers carry no information about the sending and receiving ends, it's difficult to recover a lost connection.
Point to Ponder
Question
Why is it difficult to scale WebSockets horizontally?
Due to the stateful nature of WebSocket, both endpoints are bound to the channel, and we cannot add more machines to reroute requests and distribute the workload among different servers.
While horizontally scaling a single WebSocket connection is difficult, distributing different WebSocket connections to different servers can be achieved by an appropriate intermediary, such as a load balancer or an API gateway.
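The idea of distributing whole connections, rather than individual requests, can be sketched as follows. The `ConnectionRouter` class is hypothetical and uses a simple least-connections policy as an assumption; real load balancers and API gateways offer their own strategies. The key property is that a connection stays pinned to its server for its entire lifetime.

```python
class ConnectionRouter:
    """Hypothetical intermediary that assigns each new connection to one server."""

    def __init__(self, servers: list) -> None:
        self.load = {server: 0 for server in servers}   # open connections per server
        self.assigned: dict = {}                        # connection id -> server

    def connect(self, conn_id: str) -> str:
        # An existing connection is never moved: its state lives on one server.
        if conn_id in self.assigned:
            return self.assigned[conn_id]
        server = min(self.load, key=self.load.get)      # least-loaded server wins
        self.load[server] += 1
        self.assigned[conn_id] = server
        return server

    def disconnect(self, conn_id: str) -> None:
        server = self.assigned.pop(conn_id, None)
        if server is not None:
            self.load[server] -= 1

    def add_server(self, server: str) -> None:
        # New capacity only affects connections opened afterwards.
        self.load.setdefault(server, 0)
```

Adding a server with `add_server` helps only future connections, which mirrors the limitation described above: connections that are already open cannot be rebalanced without closing and re-establishing them.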